A Text Corpora-Based Estimation of the Familiarity of Health Terminology

Authors

  • Qing Zeng-Treitler
  • Eunjung Kim
  • Jonathan Crowell
  • Tony Tse
Abstract

In a pilot effort to improve health communication, we created a method for measuring the familiarity of various medical terms. To obtain term familiarity data, we recruited 21 volunteers who agreed to take medical terminology quizzes containing 68 terms. We then created predictive models for familiarity based on term occurrence in text corpora and readers' demographics. Although the sample size was small, our preliminary results indicate that predicting the familiarity of medical terms from their frequency in text corpora is feasible. Further, individualized familiarity assessment is feasible when demographic features are included as predictors.
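As a rough illustration of the idea in the abstract, the sketch below scores a term's familiarity from its corpus frequency plus one demographic feature. The abstract does not state which model form, features, or coefficients the authors used, so everything here is an assumption: the logistic form, the function names `term_log_frequencies` and `predict_familiarity`, the `years_education` feature, the placeholder weights, and the invented toy corpus.

```python
import math
from collections import Counter


def term_log_frequencies(documents, terms):
    """Return log10(1 + occurrences per million tokens) for single-word terms."""
    tokens = [tok for doc in documents for tok in doc.lower().split()]
    counts = Counter(tokens)
    total = max(len(tokens), 1)  # avoid division by zero on an empty corpus
    return {t: math.log10(1 + 1e6 * counts[t.lower()] / total) for t in terms}


def predict_familiarity(log_freq, years_education,
                        intercept=-4.0, w_freq=1.0, w_edu=0.1):
    """Hypothetical logistic model: probability that a reader knows the term.

    The coefficients are illustrative placeholders, not fitted values.
    """
    z = intercept + w_freq * log_freq + w_edu * years_education
    return 1.0 / (1.0 + math.exp(-z))


# Toy corpus, invented for illustration only.
docs = [
    "the doctor said the fever and cough are signs of flu",
    "fever is common and a cough can last for weeks",
    "dyspnea may accompany fever",
]
freqs = term_log_frequencies(docs, ["fever", "dyspnea"])
p_common = predict_familiarity(freqs["fever"], years_education=12)
p_rare = predict_familiarity(freqs["dyspnea"], years_education=12)
```

In a real study the weights would be estimated from quiz responses such as those described above, rather than set by hand, and the frequency counts would come from large corpora rather than three sentences.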

Similar resources

Arabic News Articles Classification Using Vectorized-Cosine Based on Seed Documents

Besides its own merits, text classification (TC) has become a cornerstone in many applications. Work presented here is part of and a pre-requisite for a project we have undertaken to create a corpus for the Arabic text process. It is an attempt to create modules automatically that would help speed up the process of classification for any text categorization task. It also serves as a tool for...

Political Terms by APLL: Issues of Terminology Implantation and ‎Acceptability

The present study investigates the implantation of political science terminology approved by the Academy of Persian Language and Literature (APLL) in the Hamshahri corpus made up of news text from Hamshahri newspaper and their acceptability among MA students of English translation studies (ETS), English literature (EL), and Political science (PS). To conduct this research the frequencies of the...

Extracting a Parallel Corpus from Comparable Documents to Improve Translation Quality in Machine Translation Systems

Data used for training statistical machine translation method are usually prepared from three resources: parallel, non-parallel and comparable text corpora. Parallel corpora are an ideal resource for translation but due to lack of these kinds of texts, non-parallel and comparable corpora are used either for parallel text extraction. Most of existing methods for exploiting comparable corpora loo...

Relationship Between Hospital Cost Based on Current Procedural Terminology and Out-of-Pocket Payment of Oil Company Retirees

Marzie Afshoon (1), Leila Riahi (2*), Leila Nazarimanesh (3). (1) Department of Health Services Management, Science and Research Branch, Islamic Azad University, Tehran, Iran. Abstract. Introduction: This study aimed to investigate the relationship between hospital cost based ...

A new model for Persian multi-part words edition based on statistical machine translation

Multi-part words in the English language are hyphenated, with the hyphen separating the different parts. The Persian language has multi-part words as well. Based on Persian morphology, a half-space character is needed to separate the parts of multi-part words, but in many cases people incorrectly use a space character instead of a half-space character. This common incorrect use of space leads to some s...

Publication date: 2005